Statistical properties of sampled networks 
We study the statistical properties of the sampled scale-free networks, deeply related to the proper 
identification of various real-world networks. We exploit three methods of sampling and investigate 
the topological properties such as degree and betweenness centrality distribution, average path 
length, assortativity, and clustering coefficient of sampled networks compared with those of original 
networks. It is found that the quantities related to those properties in sampled networks appear to 
be estimated quite differently for each sampling method. We explain why such a biased estimation of 
quantities would emerge from the sampling procedure and give appropriate criteria for each sampling 
method to prevent the quantities from being overestimated or underestimated. 



I. INTRODUCTION 

Recently, a great deal of research on complex networks has been carried
out in interdisciplinary fields including mathematics, statistical
physics, computer science, sociology, and biology. Complex networks are
ubiquitous in the real world: there are technological networks such as
the Internet, biological networks such as protein interaction networks
[5], and social networks such as scientific collaboration networks [6].
Various models to explain the observed properties of those real networks
have been introduced and studied by both numerical and analytic
approaches. Relatively few works, however, have addressed possible error
or bias in collecting data and identifying real networks in a practical
sense, and most of them deal exclusively with either social networks or
the Internet.

For instance, a survey of relationships among participants has to be
conducted to construct a social network, but the collected network data
may be incomplete or erroneous since a survey usually targets only a
partial sample of a whole population [9]. The topology of the Internet
is inferred by aggregating paths or traceroutes, which also reveals only
a part of the whole Internet. In biology, protein-protein interaction
networks are identified by seeking contextual or cellular functions
mostly within specific functional modules. Identification of such
networks by experiments also has a natural fundamental limit. Thus, all
these identified networks are samples of the complete structures. In
addition, if an entire network is too large to measure some quantities
such as betweenness centrality [19, 20] because of time complexity, a
sampling process is inevitably necessary.

So far, models of networks have been designed based on features observed
in real networks, such as the small-world effect [21] and the power-law
degree distribution [22, 23]. But what if the characteristics observed
in the sampled networks differ considerably from the original structures
of the real networks? It has been shown that networks sampled by the
traceroute method may have significantly different topological
properties from the original network in some cases. Effects of missing
data in social networks have also been discussed: some problems in
conceiving social networks can cause incompleteness of data and lead to
misestimation of quantities such as mean node degree, clustering
coefficient, and assortativity. Bias in such quantities therefore needs
to be considered in a more general sense.

In a statistical sense, inference from a sample provides 
fairly reasonable estimation of a whole population if a 
large number of objects are selected randomly enough 
to be representative in the population. This naive crite- 
rion, however, cannot be applied directly to sampling net- 
works, since there are two different elements, i.e., nodes 
and links in a network. A degree distribution of nodes 
is, for example, a statistic of a network, but the degree 
is not an independent characteristic of each node. Nodes 
are literally connected to one another, by the other kind 
of components called links from which a degree is defined. 
Similarly, other properties of a network also heavily de- 
pend on the way that nodes and links are interwoven. 
There could be several different ways of sampling net- 
works due to the two interrelated elements (nodes and 
links), and each method may give distinctive features 
with respect to such properties. 

There has been a large amount of work in the physics community on random
breakdowns of, or intentional attacks on, complex networks, which can be
regarded as the exact reverse process of sampling. The analytic methods
in that work can therefore also be applied to the sampling problem. In
this paper, we adopt three basic methods of sampling networks and
investigate the effect of each method on measuring several well-known
network quantities such as degree distribution, average path length,
betweenness centrality distribution [19, 20], assortativity [28], and
clustering coefficient [21]. The observed bias of such quantities is
explained, and we provide appropriate criteria for choosing sampling
methods to measure the quantities more accurately. Some typical real
networks as well as the Barabasi-Albert model [22] are sampled for this
analysis. More general sampling processes used to identify real networks
may consist of combinations or variations of the methods presented here,
but their effects can be inferred from the results for the basic
methods.



II. SAMPLING METHODS AND NETWORKS 

We introduce three kinds of sampling methods: node sampling, link
sampling, and snowball sampling. In node sampling, a certain number of
nodes are randomly chosen and the links among them are kept. The
sampling fraction in this method is defined as the ratio of the number
of chosen nodes (including isolated nodes that will be removed later) to
the number of all the nodes in the original network. As in Fig. 1(a),
isolated nodes are neglected for convenience, although they are fully
predictable, so the number of nodes in a sampled network is slightly
smaller than the number of selected nodes. We observe the dependence of
the number of chosen links on the number of chosen nodes, since it is
related to the average degree and average path length of sampled
networks, discussed later. Suppose the fraction of selected nodes is
$\alpha$ and the fraction of links among them is $\beta$. Then it is
found that $\beta \propto \alpha^2$ if we pick nodes randomly, since the
maximum number of (undirected) links possible among $n$ selected nodes
is $\binom{n}{2} = n(n-1)/2 \sim n^2$ [29].
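The node sampling procedure and the $\beta \propto \alpha^2$ relation described above can be illustrated with a minimal Python sketch. The function name `node_sample`, the edge-list representation, and the complete-graph toy check are assumptions made for illustration and are not taken from the original study.

```python
import random
from itertools import combinations


def node_sample(edges, nodes, alpha, seed=None):
    """Keep a random fraction alpha of the nodes and the links among them.

    Hypothetical helper: `edges` is a list of (u, v) pairs and `nodes` is
    the list of all node labels of the original network.
    """
    rng = random.Random(seed)
    n_keep = round(alpha * len(nodes))
    chosen = set(rng.sample(sorted(nodes), n_keep))
    kept_edges = [(u, v) for u, v in edges if u in chosen and v in chosen]
    # Chosen nodes that end up with no link are dropped, as in the text.
    kept_nodes = {x for e in kept_edges for x in e}
    return kept_nodes, kept_edges


# Toy check on a complete graph: the link fraction beta should be close
# to alpha squared.
nodes = list(range(200))
edges = list(combinations(nodes, 2))
alpha = 0.5
_, sub_edges = node_sample(edges, nodes, alpha, seed=1)
beta = len(sub_edges) / len(edges)
print(f"alpha^2 = {alpha ** 2:.4f}, beta = {beta:.4f}")
```

On a complete graph the relation holds up to finite-size corrections for every selection, because every pair of chosen nodes is linked. On sparse networks it is expected to hold only on average over random selections.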

In link sampling, a certain number of links are randomly selected and
the nodes attached to them are kept, as in Fig. 1(b). In snowball
sampling [30, 31], we first choose a single node and pick all the nodes
directly linked to it. Then all the nodes connected to those picked in
the last step are selected, and this process is continued until the
desired number of nodes is sampled. The set of nodes selected in the
$n$th step is denoted as the $n$th layer, in the same sense as the
"radius" for ego-centered networks in Ref. [31]. See Fig. 1(c) for an
illustration. To control the number of nodes in the sampled network, the
necessary number of nodes is randomly chosen from the last layer.
Similar to the cluster-growing method used to calculate the fractal
dimension of percolation clusters in Ref. [12], the snowball sampling
method tends to pick hubs (nodes with many links) within a few steps
because of their high connectivity. Whether the initial node is a hub or
not therefore makes no noticeable difference in characterizing the
sampled network.
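The link and snowball sampling procedures described above can be sketched as follows. This is only an illustrative reading of the text: the function names are hypothetical, nodes are assumed to carry sortable labels, and the choice to keep every original link between two selected nodes (including links inside the last layer) is an assumption, since the text does not spell out that detail.

```python
import random
from collections import defaultdict


def link_sample(edges, alpha, seed=None):
    """Keep a random fraction alpha of the links and the nodes attached to them."""
    rng = random.Random(seed)
    kept = rng.sample(list(edges), round(alpha * len(edges)))
    nodes = {x for e in kept for x in e}
    return nodes, kept


def snowball_sample(edges, start, n_target, seed=None):
    """Grow layers outward from `start` until n_target nodes are selected."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    selected = {start}
    layer = [start]
    while layer and len(selected) < n_target:
        # Next layer: all not-yet-selected neighbors of the current layer.
        nxt = sorted({w for u in layer for w in adj[u]} - selected)
        room = n_target - len(selected)
        if len(nxt) > room:
            # Last layer: keep only as many randomly chosen nodes as needed.
            nxt = rng.sample(nxt, room)
        selected.update(nxt)
        layer = nxt
    # Assumption: keep every original link whose two ends were both selected.
    kept = [(u, v) for u, v in edges if u in selected and v in selected]
    return selected, kept


# Usage on a small star graph: starting from a leaf, the hub is reached in
# the first layer and the remaining nodes are drawn from the second layer.
star = [(0, i) for i in range(1, 11)]
sel, sub = snowball_sample(star, start=1, n_target=5, seed=0)
print(sorted(sel), len(sub))
```

The star example mirrors the remark in the text that hubs are reached within very few steps regardless of the initial node.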

For numerical analysis of the sampled networks, we use the
Barabasi-Albert (BA) scale-free network as an example of model networks,
which follows the power-law degree distribution $p(k) \sim k^{-3}$, with
30000 nodes and $m_0 = m = 4$ [23]. We also consider three real-world
networks from various fields: the protein interaction network (PIN)
[5, 33], the Internet at the autonomous systems (AS) level [34], and the
e-print archive coauthorship network (arxiv.org) [6]. The numbers of
nodes and links for each network are in Table I. Although results from




[Figure 1: panels (a) node sampling, (b) link sampling, (c) snowball
sampling; panel (c) labels the initial node and the 1st, 2nd, and 3rd
layers.]

FIG. 1: Three kinds of sampling method. (a) Node sampling: Select the
circled nodes and keep the three links among them; the isolated node is
removed. (b) Link sampling: Select the three circled links and the six
nodes attached to them. (c) Snowball sampling: Starting from the circled
node, select nodes and links attached to them by tracing links.

Network        n        l         Ref.
PIN            5077     16449     [5, 33]
Internet AS    10515    21455     [34]
arxiv.org      49983    245300    [6]

TABLE I: The numbers of nodes n and links l for each real network.



other homogeneous networks are also discussed in Sec. IV, most of the
networks considered in this work are undirected scale-free networks
following a power-law degree distribution, $p(k) \sim k^{-\gamma}$,
where $2 < \gamma < 3$.



III. CHARACTERISTICS OF SAMPLED 
NETWORKS 

A. Degree distribution and average path length 

The degree of a node is defined as the number of links attached to the
node. Many real networks are shown to have a power-law degree
distribution $p(k) \sim k^{-\gamma}$, including the networks considered
in this paper. We found that, in general, the degree distributions of
networks sampled from the four networks by all three methods follow a
power law, as do those of the original networks. The exponents $\gamma$
of the degree distribution (degree exponents) are extracted using the
maximum likelihood estimate given by the formula [35]

$$\gamma = 1 + n\left[\sum_{i=1}^{n} \ln\frac{k_i}{k_{\min}}\right]^{-1}, \tag{1}$$

where $n$ is the number of elements in a set $\{k_i\}$ whose elements
follow the power-law distribution $p(k) \sim k^{-\gamma}$, and
$k_{\min}$ is the smallest element for which the power-law behavior
holds. Figure 2 shows the change of the degree exponent for the sampled
networks from each network
obtained by numerical simulation for each method as we change the
sampling fraction $\alpha$.

FIG. 2: Changes of degree exponent $\gamma$ for each network's sampled
networks according to the sampling fraction $\alpha$, averaged over ten
independent realizations. Empty squares (□) stand for node sampling,
filled squares (■) for link sampling, and empty triangles (△) for
snowball sampling. The horizontal dashed lines are the values for the
original exponent of each network, and the solid lines represent the
values obtained by Eq. (5).
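As an illustration of the maximum likelihood estimate of Eq. (1), the short Python sketch below computes $\gamma = 1 + n[\sum_i \ln(k_i/k_{\min})]^{-1}$ from a list of degrees. The function name `degree_exponent_mle` and the way $k_{\min}$ is passed in by hand are assumptions for illustration; how $k_{\min}$ was actually chosen for each network is not stated in the text.

```python
import math


def degree_exponent_mle(degrees, k_min):
    """Maximum likelihood estimate of the power-law exponent, Eq. (1).

    Only degrees k >= k_min enter the estimate; k_min is supplied by the
    caller (hypothetical interface).
    """
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    log_sum = sum(math.log(k / k_min) for k in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need at least one degree strictly above k_min")
    return 1.0 + n / log_sum


# Small worked example: for degrees 2, 4, 8 with k_min = 2 the logarithms
# sum to 3 ln 2, so the estimate is 1 + 1/ln 2.
print(degree_exponent_mle([2, 4, 8], k_min=2))
```

The same estimator can be applied to any positive quantity with a power-law tail, which is how it is reused for the betweenness centrality exponent later in the text.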

For node sampling, we fix the number of sampled nodes and select nodes
randomly. In this case, the new degree distribution $p'(k)$ of the
sampled network is expressed as

$$p'(k) = \sum_{k_0=k}^{n-1} p(k_0) \binom{k_0}{k} \binom{n-k_0-1}{n_s-k-1} \bigg/ \binom{n-1}{n_s-1}, \tag{2}$$

where $p(k)$ is the degree distribution of the original network, $n$ is
the number of nodes in the original network, and $n_s = \alpha n$ is the
fixed number of sampled nodes. In the case that the number of nodes in
sampled networks is not fixed but only the probability $\alpha$ with
which individual nodes are selected is given, Eq. (2) should be written
as

$$p'(k) = \sum_{n_s=k+1}^{n} \sum_{k_0=k}^{n-1} p(k_0) f(n_s) \binom{k_0}{k} \binom{n-k_0-1}{n_s-k-1} \bigg/ \binom{n-1}{n_s-1}, \tag{3}$$

where the probability that $n_s$ nodes are chosen is
$f(n_s) = \binom{n}{n_s} \alpha^{n_s} (1-\alpha)^{n-n_s}$. If the number
of nodes is fixed, $f(n_s) = \delta(n_s - \alpha n)$ and Eq. (3) becomes
Eq. (2) with $n_s = \alpha n$. Even if the number of nodes is not fixed,
when the system size is large enough to use the approximation
$f(n_s) \simeq \delta(n_s - \alpha n)$, we can safely use Eq. (2).
Equation (2) can be further reduced by using $n!/(n-m)! \simeq n^m$ for
$n \gg m$. Suppose $n, n_s \gg k_0$. Then

$$\binom{n-k_0-1}{n_s-k-1} \bigg/ \binom{n-1}{n_s-1} \simeq \left(\frac{n_s}{n}\right)^{k} \left(1-\frac{n_s}{n}\right)^{k_0-k} = \alpha^{k} (1-\alpha)^{k_0-k}, \tag{4}$$
FIG. 3: Degree distribution for sampled networks of (a) the C. elegans
neural network with $\alpha = 120/297$ and (b) the Zachary karate club
network with $\alpha = 20/34$, obtained from node sampling. Empty
circles are simulation results from 1000 sampling processes. Solid lines
correspond to Eq. (2) and dashed lines to Eq. (5). Insets show the
large-degree part of each graph, where the difference between the two
formulae is prominent.



which leads to the formula previously used in the literature,

$$p'(k) = \sum_{k_0=k}^{\infty} p(k_0) \binom{k_0}{k} \alpha^{k} (1-\alpha)^{k_0-k}. \tag{5}$$



The sizes of all four networks studied in this paper are larger than
5000, and we have checked that Eqs. (2) and (5) give practically the
same values of $p'(k)$ and are indistinguishable in the graphs. For much
smaller networks, on the other hand, Eq. (2) predicts the degree
distribution of sampled networks better than Eq. (5). In Fig. 3, we
compare the simulation results for two small networks, the nematode
C. elegans neural network [37] with 297 nodes and 2359 links and the
Zachary karate club network [38] with 34 nodes and 77 links, with those
two equations by substituting the original degree distribution
$p(k) = n_k/n$, where $n_k$ is the number of nodes with degree $k$. The
figure clearly shows that Eq. (2) is more accurate.
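The difference between the exact fixed-size formula, Eq. (2), and the binomial approximation, Eq. (5), can be explored numerically with a sketch like the one below. The function names and the dictionary representation of $p(k)$ are assumptions for illustration. Both functions return the distribution including $k = 0$, i.e., before isolated nodes are discarded from the sampled network.

```python
from math import comb


def sampled_dist_hypergeom(p, n, n_s):
    """Eq. (2): degree distribution after choosing exactly n_s of n nodes.

    `p` maps an original degree k0 to its probability p(k0).
    """
    denom = comb(n - 1, n_s - 1)
    out = {}
    for k0, pk0 in p.items():
        for k in range(min(k0, n_s - 1) + 1):
            # k of the k0 neighbors and n_s-1-k of the n-1-k0 non-neighbors
            # are among the other n_s-1 chosen nodes.
            w = comb(k0, k) * comb(n - 1 - k0, n_s - 1 - k) / denom
            out[k] = out.get(k, 0.0) + pk0 * w
    return out


def sampled_dist_binomial(p, alpha):
    """Eq. (5): each neighbor is kept independently with probability alpha."""
    out = {}
    for k0, pk0 in p.items():
        for k in range(k0 + 1):
            w = comb(k0, k) * alpha**k * (1 - alpha) ** (k0 - k)
            out[k] = out.get(k, 0.0) + pk0 * w
    return out


# Tiny example: a 3-regular network of n = 10 nodes sampled at alpha = 0.5.
p = {3: 1.0}
exact = sampled_dist_hypergeom(p, n=10, n_s=5)
approx = sampled_dist_binomial(p, alpha=0.5)
mean_exact = sum(k * v for k, v in exact.items())    # 3 * (n_s - 1)/(n - 1)
mean_approx = sum(k * v for k, v in approx.items())  # 3 * alpha
print(mean_exact, mean_approx)
```

For such a small network the two means differ noticeably ($4/3$ versus $3/2$), while for $n, n_s \gg k_0$ the ratio $(n_s-1)/(n-1)$ approaches $\alpha$ and the two formulas converge, in line with the statement above.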

The above equations turn out to apply to link sampling with the same
sampling fraction $\alpha$ as well. Here we can use the technique of
Ref. [36] for solving the bond percolation or epidemic model. Suppose a
node that originally had $k_0$ links before sampling comes to have $k$
links. Because random link sampling chooses links uniformly, the
probability of the node keeping $k$ out of $k_0$ links is
$p(k|k_0) = \binom{k_0}{k} \alpha^{k} (1-\alpha)^{k_0-k}$. Consequently,
the probability that a node in the sampled network has degree $k$,
summed over all possible original degrees $k_0$, is
$p'(k) = \sum_{k_0=k}^{\infty} p(k_0)\, p(k|k_0)$, which leads us back
to Eq. (5). The fact that the two kinds of sampling are described by the
same equation is also supported by Fig. 2, which shows similar changes
of the degree exponent for both node and link sampling.

FIG. 4: Change of nodes' degree in the BA network for snowball sampling.
The horizontal axis is the degree in the original network. The sampling
fraction is 10000/30000.

As Stumpf et al. point out, Eq. (5) for a power-law degree distribution
$p(k) \sim k^{-\gamma}$ yields a deviation of $p'(k)$ from the original
power-law form for quite small sampling fractions $\alpha$. For moderate
values of $\alpha$, however, the deviation is not significant, and we
observe that the slope of $p'(k)$ in the log-log plot actually becomes
steeper according to Eq. (5), consistent with our numerical observations
for node and link sampling shown in Fig. 2. To extract the degree
exponents from Eq. (5), we first calculate the degree distribution of
the original networks as $p(k) = n_k/n$. Substituting that $p(k)$ into
Eq. (5), we obtain the degree distribution $p'(k)$ of the sampled
networks corresponding to a given sampling fraction $\alpha$. The degree
exponents from those $p'(k)$ in Fig. 2 show good agreement with the
values from numerical simulation for both node and link sampling.

In contrast, the degree exponent decreases for snowball sampling as we
decrease the sampling fraction. By the definition of snowball sampling,
hubs are more likely to be selected by this method. Furthermore, once a
hub is picked, every node connected to the hub is selected in the next
step unless it belongs to the previous layer. This characteristic of
snowball sampling tends to conserve the degrees of easily selected hubs,
which decreases the degree exponent by holding the "tail" of the
power-law degree distribution. Figure 4 shows the degrees in a sampled
network obtained by snowball sampling, and the nodes with large degree
on the $y = x$ line clearly indicate a tendency to choose hubs and
conserve their degrees. Therefore, snowball sampling underestimates the
degree exponent. In Ref. [11], it is shown that traceroute sampling can
underestimate the degree exponent of a scale-free network by
undersampling the low-degree nodes relative to the high-degree ones. In
spite of the differences between snowball and traceroute sampling, both
methods overrepresent hubs and have the same "crawling" character in
identifying nodes. We infer that the decrease of the degree exponent for
both sampling methods is caused by these similar features.

FIG. 5: Changes of APL for each network's sampled networks according to
the ratio of the size of the giant component in the sampled networks to
that of the original ones, averaged over ten independent realizations.
Empty squares (□) stand for node sampling, filled squares (■) for link
sampling, and empty triangles (△) for snowball sampling. The horizontal
dashed lines are the values for the original APL of each network, and
the other lines are guides to the eyes.

We also check two closely related quantities, namely the average degree
and the average path length (APL) of the sampled networks. The APL is
the average of the shortest path lengths between all pairs of nodes in a
network, often used as a measure of network efficiency. In Fig. 5, we
present the APL of the giant component in the sampled networks obtained
by numerical simulation. For snowball sampling, the APL decreases as the
size of the sampled network decreases. For node and link sampling, on
the other hand, the APL of a sampled network is larger than that of the
original network for not-so-small sampling fractions, even though the
sampled network itself is smaller than the original one. As presented
earlier, for node sampling the number of links is proportional to the
square of the number of nodes, which leads to
$\langle k \rangle = 2l/n \propto n$, where $l$ and $n$ are the numbers
of links and nodes in a sampled network, respectively. This suggests
that the average degree in a sampled network decreases as the sampling
fraction becomes smaller. For a given network size, the APL decreases as
the average degree increases. The diminishment of the average degree
therefore seems to have a stronger effect on the APL than the overall
system size in this case. Similar behavior of the average degree and APL
is observed for link sampling, but in this case it seems that the
"treelike" structure of sampled networks, related to the clustering
coefficient discussed later, is responsible for that behavior.
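The APL of the giant component, as used above, can be computed with plain breadth-first search. The sketch below is one straightforward way to do it for small unweighted, undirected networks; the function name `giant_component_apl` and the edge-list input are assumptions for illustration, and the all-pairs search is not meant to be efficient for networks of the sizes studied here.

```python
from collections import defaultdict, deque


def giant_component_apl(edges):
    """Average shortest-path length over all node pairs of the largest component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def bfs(src):
        # Hop distances from src to every node reachable from it.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    seen, giant = set(), set()
    for s in list(adj):
        if s not in seen:
            comp = set(bfs(s))
            seen |= comp
            if len(comp) > len(giant):
                giant = comp
    m = len(giant)
    if m < 2:
        return 0.0
    total = sum(sum(bfs(s).values()) for s in giant)  # ordered pairs
    return total / (m * (m - 1))


# A 4-node path plus a detached link: only the path counts.
print(giant_component_apl([(0, 1), (1, 2), (2, 3), (10, 11)]))
```

Restricting the average to the giant component avoids infinite distances between disconnected parts, which matters for node and link sampling because they fragment the network.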



B. Betweenness centrality distribution 

Betweenness centrality (BC, or load), which measures the centrality of a
node by the traffic flow in a network, is defined for node $b$ as

$$g_b = \sum_{i \neq j} \frac{C_b(i,j)}{C(i,j)}, \tag{6}$$

where $C(i,j)$ is the number of all the shortest pathways between a pair
of nodes $(i,j)$ and $C_b(i,j)$ is the number of those shortest pathways
running through node $b$ [20]. It is known that the BC distribution
follows a power law $p(g) \sim g^{-\eta}$ for scale-free networks
[19, 20].
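The quantity $g_b$ of Eq. (6) can be evaluated with Brandes' accumulation scheme for unweighted graphs, which is a standard algorithm and not necessarily the one used in the original study. In the sketch below the sum runs over ordered pairs $(i,j)$ and a node is not counted for paths on which it is an endpoint; both conventions, like the function name `betweenness`, are assumptions, since the text does not state them.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Betweenness centrality g_b of Eq. (6) for an undirected, unweighted graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    g = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)    # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                g[w] += delta[w]
    return g


# On the path 0-1-2 only the middle node lies between another pair, and it
# is counted once for (0, 2) and once for (2, 0).
print(betweenness([(0, 1), (1, 2)]))
```

With these conventions a different normalization only rescales $g_b$ by a constant factor, which does not affect the exponent $\eta$ of the BC distribution.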

Similar to the degree distribution, the BC distribution of sampled
networks also follows a power law well, as do those of the original
networks. Figure 6 shows the change of the BC exponent, also obtained by
Eq. (1), for each network and each sampling method. As in the case of
the degree exponent, BC exponents in general increase for node and link
sampling and decrease for snowball sampling as the sampling fraction
gets lower. Figure 6 bears a resemblance to Fig. 2 except for the case
of arxiv.org, for which the BC exponent seems to be conserved for all
the sampling methods. The correlation between the degree and BC of nodes
[39], shown in Fig. 7, could explain why the degree and BC exponents
change in the same direction. For assortative networks such as arxiv.org
here, however, it is known that the degree-BC correlation is not clear
[40], which explains the different behavior in Fig. 6(d). Therefore, at
least empirically, we expect overestimation of the BC exponent by node
and link sampling and underestimation by snowball sampling.



C. Assortativity 
FIG. 6: Changes of BC exponent $\eta$ for each network's sampled
networks according to the sampling fraction $\alpha$, averaged over ten
realizations. Empty squares (□) stand for node sampling, filled squares
(■) for link sampling, and empty triangles (△) for snowball sampling.
The horizontal dashed lines are the values for the original exponent of
each network, and the other lines are guides to the eyes.

FIG. 7: Degree and BC of nodes in a network sampled from the BA network
by node sampling. The sampling fraction is 10000/30000. The value of BC
is rescaled by the number of nodes.



The assortativity $r$, which measures the correlation between the
degrees of nodes linked to each other, is defined as the Pearson
correlation coefficient of the degrees between pairs of linked nodes
[28]. Positive values of $r$ indicate positive degree-degree
correlation, meaning that nodes with large degrees tend to be connected
to one another. Most social networks have this positive degree
correlation (assortative mixing), including the arxiv.org network
considered in this paper. On the other hand, most biological and
technological networks show negative degree correlation $r < 0$
(disassortative mixing), including the PIN and Internet AS networks
here. If there is no degree correlation among nodes (neutral), as in the
case of the BA model, the value of $r$ is in the vicinity of 0.
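The definition of $r$ as a Pearson correlation coefficient over linked pairs can be made concrete with a few lines of Python. The sketch below counts every undirected link in both directions so that the measure is symmetric; the function name `assortativity` and the use of full degrees rather than remaining degrees are assumptions for illustration (subtracting one from every degree does not change a Pearson coefficient).

```python
from collections import defaultdict


def assortativity(edges):
    """Pearson correlation of the degrees found at the two ends of each link."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # Each undirected link contributes both orientations.
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    m = len(xs)
    mean = sum(xs) / m
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        raise ValueError("r is undefined when all link ends have equal degree")
    return cov / var


# A star is the extreme disassortative case: the hub links only to leaves.
print(assortativity([(0, 1), (0, 2), (0, 3)]))
```

The star example is a useful mental picture for the snowball results discussed below, where well-preserved hubs end up attached to many low-degree nodes of the last layer.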

The change of assortativity for each network and each method is shown in
Fig. 8. For node and link sampling, no noticeable changes of
assortativity in the sampled networks are observed. Random choice of
nodes or links appears to conserve assortativity well for these two
methods. Networks obtained by snowball sampling, however, are shown to
be more disassortative than the original networks. This pattern is
common no matter whether the original network is assortative
(arxiv.org), disassortative (PIN and Internet AS), or neutral (BA). In
Ref. [41], a formula for the change of assortativity under the link
sampling process is presented as follows,



(7) 



a 



(k 2 )/(k) 1 



where $\langle k^n \rangle$ is the $n$th moment of the degree of the
original network. Our data fit Eq. (7) very well, as shown in Fig. 9. In
our datasets, where the degree exponent $\gamma < 4$,
$\langle k^3 \rangle$ dominates in Eq. (7) and $r' \simeq r$ in most
cases, which is consistent with our numerical data for link sampling.

FIG. 8: Changes of assortativity $r$ for each network's sampled networks
according to the sampling fraction $\alpha$, averaged over ten
realizations. Empty squares (□) stand for node sampling, filled squares
(■) for link sampling, and empty triangles (△) for snowball sampling.
The horizontal dashed lines are the values for the original
assortativity of each network, and the other lines are guides to the
eyes.

FIG. 9: Changes of assortativity $r$ under link sampling for our four
datasets, and comparison with Eq. (7).

FIG. 10: $\langle k_{nn}(k) \rangle$ distribution for sampled networks
of (a) PIN, (b) arxiv.org by snowball sampling.

There is another way to check the degree correlation, which is to
measure the quantity
$\langle k_{nn}(k) \rangle = \sum_{k'} k' p(k'|k)$, i.e., the average
degree of the nearest neighbors of nodes with degree $k$ [42].
Assortative mixing is represented by a positive slope of the
$\langle k_{nn}(k) \rangle$ graph, while a horizontal graph indicates
neutral mixing and a negative slope disassortative mixing. Figure 10
shows the changes of these slopes for the $\langle k_{nn}(k) \rangle$
graphs of networks sampled from two kinds of original networks by
snowball sampling. The slope decreases, i.e., moves toward negative
values, as the sampling fraction gets lower for both disassortative
Internet AS and assortative arxiv.org.
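The quantity $\langle k_{nn}(k) \rangle$ discussed above can be computed directly from an edge list, as in the following minimal sketch. The function name `knn_by_degree` is hypothetical. Averaging the mean neighbor degree over all nodes of degree $k$ gives the same value as $\sum_{k'} k' p(k'|k)$, because every node of degree $k$ contributes exactly $k$ link ends.

```python
from collections import defaultdict


def knn_by_degree(edges):
    """Average nearest-neighbor degree <k_nn(k)> for each degree k present."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, nbrs in adj.items():
        k = len(nbrs)
        # Mean degree of the neighbors of this particular node.
        sums[k] += sum(len(adj[w]) for w in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Star graph: leaves (k = 1) see the hub, the hub (k = 3) sees only leaves,
# so the curve has a negative slope (disassortative).
print(knn_by_degree([(0, 1), (0, 2), (0, 3)]))
```

Plotting the returned values against $k$ on logarithmic axes gives the kind of curve whose slope is compared across sampling fractions in Fig. 10.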

We suggest that the more disassortative nature of sam- 
pled networks compared with the original ones is due to 
the last layer of snowball sampling method. In contrast 



to the conserved structure of the inner layers, a consid- 
erable number of links are lost for the nodes in the last 
layer. Meanwhile, hubs are likely to be selected for snow- 
ball sampling. This separation of "core" and "periphery" 
part is seen in Fig. [?J and the connections between hubs 
and nodes of the last layer can reduce the value of assor- 
tativity. The simulation shows that a sampled network 
containing the entire last layer is more disassortative than 
the one where only parts of the last layer are kept, which 
supports the hypothesis that the effect of the last layer 
induces disassortative mixing. Therefore, we have to be 
careful when measuring the assortativity for the network 
from the snowball sampling. 



D. Clustering coefficient 

The clustering coefficient $C_i$ of node $i$ is the ratio of the total
number $y$ of links connecting its nearest neighbors to the total number
of all possible links between these nearest neighbors [21],

$$C_i = \frac{2y}{k_i(k_i - 1)}, \tag{8}$$

where $k_i$ is the degree of node $i$. The clustering coefficient of a
network is the average of this value over all the nodes,
$C = \sum_i C_i/n$, where $n$ is the number of nodes. Most real networks
have a much larger clustering coefficient than model networks such as
the ER or BA network because of, e.g., community or modular structure.
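Equation (8) and the network average $C$ translate into code almost literally. In the sketch below the function name `clustering` is hypothetical, nodes are assumed to carry sortable labels, and nodes with fewer than two neighbors are assigned $C_i = 0$, a common convention that the text does not specify.

```python
from collections import defaultdict
from itertools import combinations


def clustering(edges):
    """Return ({node: C_i}, C) following Eq. (8)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    c = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[i] = 0.0  # assumption: undefined cases count as zero
            continue
        # y = number of links among the neighbors of node i
        y = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
        c[i] = 2 * y / (k * (k - 1))
    return c, sum(c.values()) / len(c)


# A triangle with one pendant node, before and after losing one link.
full = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(clustering(full)[1])
print(clustering([e for e in full if e != (0, 1)])[1])
```

Removing a single link of the triangle drops the average from $7/12$ to zero in this toy case, a small-scale picture of why randomly omitting links lowers the clustering coefficient so quickly.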

In Fig. 11, we show the change of the clustering coefficient for each
original network and each sampling method. For node and snowball
sampling, the clustering coefficient changes only a little, in a way
that depends on the network. Link sampling, on the other hand,
prominently reduces the clustering coefficient. This effect is expected,
since the random omission of links, the reverse process of link
sampling, "opens up triangles fast," as has been stated previously. Link
sampling therefore underestimates the clustering coefficient of a
network.

FIG. 11: Changes of clustering coefficient $C$ for each network's
sampled networks according to the sampling fraction $\alpha$, averaged
over ten realizations. Empty squares (□) stand for node sampling, filled
squares (■) for link sampling, and empty triangles (△) for snowball
sampling. The horizontal dashed lines are the values for the original
clustering coefficient of each network, and the other lines are guides
to the eyes.

TABLE II: The changes of quantities in networks by each sampling method
as the sampling fraction gets lower (indicated by ⇓ at the very right of
each sampling method). ⇑ stands for increase, ⇓ for decrease, = for the
same, and ⇕ for depending on networks.



with the BA model, are used as the original networks for numerical
investigation. We have measured four typical quantities in sampled
networks, which show characteristic patterns of change for each sampling
method. Based on the properties of the sampling methods, possible
explanations for such changes as well as a mathematical analysis are
provided. We have also analyzed networks other than scale-free ones,
such as the Erdos-Renyi random network [43] and the growing network
without preferential attachment [22], and the results show that the form
of the degree distribution is conserved for node and link sampling in
those cases, consistent with previous work [17].

Table II summarizes the results. To check the generality of the results,
we also investigated the randomized version of each network in a similar
fashion. The randomized networks were constructed by shuffling the links
while conserving only the degree distribution. We found the same results
as with the original networks. The results in Table II therefore seem to
hold for scale-free networks in general and provide criteria for
choosing a sampling method when a specific quantity is to be
investigated by sampling. From another viewpoint, the bias of some
quantities can be predicted if the specific sampling method used to
identify a network is known. If we are interested in the assortativity
of a network, for example, node or link sampling can give fairly
accurate values. For the clustering coefficient, on the other hand, the
link sampling method should be avoided.
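One common way to build the randomized networks mentioned above is to swap the end points of randomly chosen link pairs, which keeps every node's degree fixed. The sketch below shows this standard double-edge-swap procedure; whether the original study used exactly this scheme is not stated, and the function name, the number of swaps, and the attempt limit are assumptions for illustration.

```python
import random


def randomize_preserving_degrees(edges, n_swaps, seed=None):
    """Shuffle links by double-edge swaps, keeping every node's degree fixed.

    Swaps that would create a self-loop or a duplicate link are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Usage: randomize a 10-node ring; every node keeps degree 2.
ring = [(i, (i + 1) % 10) for i in range(10)]
print(randomize_preserving_degrees(ring, n_swaps=20, seed=0))
```

Because each accepted swap replaces links a-b and c-d by a-d and c-b, the degree of every node is conserved, which is a slightly stronger condition than conserving the degree distribution alone, while degree correlations and clustering are scrambled.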

Sampling problems should be taken into account for 
real network research, but not much work has been done 
so far. Exploration of other characteristics of complex 
networks or using other sampling methods, rigorous an- 
alytic approaches, and establishing solid principles by 
more systematic investigation could all be important re- 
search topics for the future. We hope this work can make 
a contribution to this direction of research. 